Large Deviations and Full Edgeworth Expansions for Finite Markov Chains with Applications to the Analysis of Genomic Sequences

Authors

  • Pierre Pudlo
Abstract

To establish lists of words with unexpected frequencies in long sequences, for instance in a molecular biology context, one needs to quantify the exceptionality of families of word frequencies in random sequences. To this end, we study large deviation probabilities of multidimensional word counts for Markov and hidden Markov models. More specifically, we compute local Edgeworth expansions of arbitrary degree for multivariate partial sums of lattice-valued functionals of finite Markov chains. This yields sharp approximations of the associated large deviation probabilities. We also provide detailed simulations. These exhibit, in particular, previously unreported periodic oscillations, for which we provide theoretical explanations.

Mathematics Subject Classification. 60J10, 60F10, 60J55, 92D20, 60F05.

Received July 10, 2008. Revised March 3, 2009 and June 15, 2009.

Introduction

This paper is devoted to the determination of exact asymptotics of the probabilities of large deviation events for multidimensional additive functionals of finite Markov chains. A motivation that arises in molecular biology is the determination of under- and over-represented words in genomic sequences (DNA, RNA, and proteins); see Reinert et al. [28] for example. Words with unexpected frequencies in genomic sequences are natural candidates to represent biological signals. A well-known example is the Chi motif in the sequence of Escherichia coli, namely the word GCTGGTGG, which is massively over-represented and which plays a crucial role in the conservation of the genome of this bacterium. Of course, to detect words with unexpected frequencies, one must specify a stochastic model of the sequence. The most usual models are Markovian: either one assumes that the sequence itself is Markov, or one uses hidden Markov models.
Since hidden Markov chains can be represented as functionals of Markov chains, and since Markov chains of higher order are projections of simple Markov chains (that is, Markov chains of order 1) defined on product spaces, functionals of Markov chains and of hidden Markov chains of any order can be viewed as functionals of simple Markov chains. The distribution of the number of visits to a given state by a Markov chain is a much studied subject, in particular in a molecular biology context. For instance, Robin and Daudin [29], Régnier [25] and Stefanov et al. [33] provide exact formulas for these distributions. Nuel [23] turns the occurrences of any pattern over a
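The reduction of higher-order chains to simple chains on a product space, mentioned in the introduction above, can be illustrated with a minimal sketch. It assumes an order-2 chain on the DNA alphabet, given as a hypothetical dictionary `p2` of conditional probabilities, and lifts it to an order-1 transition matrix on pairs of letters. The helper names `lift_order2` and `count_word` are illustrative only and do not come from the paper; the overlapping word count is shown as one plausible example of an additive functional of the chain.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed DNA alphabet


def lift_order2(p2):
    """Lift an order-2 kernel to an order-1 chain on pair states.

    p2[(a, b)][c] is assumed to hold P(X_{n+1} = c | X_{n-1} = a, X_n = b).
    The lifted chain moves from pair (a, b) to pair (b, c) with that
    probability; all other transitions have probability zero.
    """
    states = list(product(ALPHABET, repeat=2))
    index = {s: i for i, s in enumerate(states)}
    n = len(states)
    P = [[0.0] * n for _ in range(n)]
    for (a, b) in states:
        for c in ALPHABET:
            P[index[(a, b)]][index[(b, c)]] = p2[(a, b)][c]
    return states, P


def count_word(seq, word):
    """Count overlapping occurrences of word in seq."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


# Hypothetical uniform order-2 kernel, used only to exercise the lifting.
uniform = {s: {c: 0.25 for c in ALPHABET} for s in product(ALPHABET, repeat=2)}
states, P = lift_order2(uniform)
print(len(states), count_word("GCTGGTGGTGG", "GTGG"))
```

Each row of the lifted matrix has at most four nonzero entries, since the pair (a, b) can only be followed by a pair that starts with b.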


Similar resources

Small time Edgeworth-type expansions for weakly convergent nonhomogeneous Markov chains

We consider triangular arrays of Markov chains that converge weakly to a diffusion process. Second order Edgeworth type expansions for transition densities are proved. The paper differs from recent results in two respects. We allow nonhomogeneous diffusion limits and we treat transition densities with time lag converging to zero. Small time asymptotics are motivated by statistical applications ...


Evaluation of First and Second Markov Chains Sensitivity and Specificity as Statistical Approach for Prediction of Sequences of Genes in Virus Double Strand DNA Genomes

The growing amount of information on biological sequences has made the application of statistical approaches necessary for modeling and estimating their functions. In this paper, the sensitivity and specificity of first- and second-order Markov chains for gene prediction were evaluated using complete double-stranded DNA virus genomes. There were two approaches for prediction of each Markov Model parameter,...


Empirical Bayes Estimation in Nonstationary Markov chains

Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N>=2 independent and identically dis...


Analysis and Testing of Asymmetry in the Central Bank's Monetary Policy Behavior

According to the Taylor (1993) rule, the monetary authority responds to deviations of output and inflation from their targets through fluctuations in the nominal interest rate, which is regarded as the policy instrument. Another specification that has received considerable attention is that policymakers may have asymmetric preferences with regard to their objectives during recessions and expansions. Since according ...


Parry Expansions of Polynomial Sequences

We prove that the sum-of-digits function with respect to certain digital expansions (which are related to linear recurrences) and similarly defined functions evaluated on polynomial sequences of positive integers or primes satisfy a central limit theorem. These digital expansions are special cases of numeration systems associated to primitive substitutions on finite alphabets, the digits of whi...




Publication date: 2010